Development of phenotyping algorithms for hypertensive disorders of pregnancy (HDP) and their application in more than 22,000 pregnant women

Recently, many phenotyping algorithms for high-throughput cohort identification have been developed. Prospective genome cohort studies are critical resources for precision medicine, but there are many hurdles in the precise cohort identification. Consequently, it is important to develop phenotyping algorithms for cohort data collection. Hypertensive disorders of pregnancy (HDP) is a leading cause of maternal morbidity and mortality. In this study, we developed, applied, and validated rule-based phenotyping algorithms of HDP. Two phenotyping algorithms, algorithms 1 and 2, were developed according to American and Japanese guidelines, and applied into 22,452 pregnant women in the Birth and Three-Generation Cohort Study of the Tohoku Medical Megabank project. To precise cohort identification, we analyzed both structured data (e.g., laboratory and physiological tests) and unstructured clinical notes. The identified subtypes of HDP were validated against reference standards. Algorithms 1 and 2 identified 7.93% and 8.08% of the subjects as having HDP, respectively, along with their HDP subtypes. Our algorithms were high performing with high positive predictive values (0.96 and 0.90 for algorithms 1 and 2, respectively). Overcoming the hurdle of precise cohort identification from large-scale cohort data collection, we achieved both developed and implemented phenotyping algorithms, and precisely identified HDP patients and their subtypes from large-scale cohort data collection.


Development of hypertensive disorders of pregnancy (HDP) phenotyping algorithm 1
We developed HDP phenotyping algorithm 1 according to the American College of Obstetricians and Gynecologists (ACOG) guidelines 23 .Algorithm 1 detects HDP patients based on hypertensive disease history, blood pressure (BP), proteinuria (PU), timing of onset and PE-related clinical conditions (see "Details of the identification rules for the HDP subgroups in algorithm 1" section in the Supplemental Methods).
PU in early pregnancy is caused by several possibilities both related and unrelated to HDP, such as some degree of renal insufficiency and chronic kidney disease (CKD).To avoid overestimating the HDP group, we implemented rules to exclude subjects with PU in early pregnancy from the HDP group.

Development of hypertensive disorders of pregnancy (HDP) phenotyping algorithm 2
We developed HDP phenotyping algorithm 2 according to the guidelines of the Japan Society of Obstetrics and Gynecology (JSOG) (see "Details of the identification rules for the HDP subgroups in algorithm 2" section in the Supplemental Methods).In algorithm 2, hypertensive subjects without PU were assigned to SPE or PE when the following conditions were met: maternal organ dysfunction (see Supplementary Table 1) or light-for-date.(see "Selection of subjects with light-for-date" section in the Supplemental Methods).The patients having maternal organ dysfunction were determined by analyzing clinical notes (see "Selection of subjects with maternal organ dysfunction" section in the Supplemental Methods).More details about the process used to reach the target conditions are provided in the "Details of the process used to analyze unstructured clinical notes" section in the Supplemental Methods.

Implementation of the developed phenotyping algorithms
The developed phenotyping algorithms according to American and Japanese clinical guidelines were implemented by Python 3.8.10 and Perl 5.16.3 (https:// github.com/ ogish imalab/ HDP_ pheno typing_ algor ithms).The implementations take no arguments and take both data for phenotyping and the lists of subjects who have specific phenotype about exclusion and inclusion criteria of HDP and their subtypes.The lists were required to prepare before phenotyping by analysis of clinical notes performed by such as pattern matching based on rules (see "Details of the processes used to analyze unstructured clinical notes" and "Selection of subjects with lightfor-date" in Supplemental Methods).

Data sources used to apply the developed algorithms
We selected research subjects from approximately 23,000 pregnant women who were included in the BirThree Cohort study, which is a population-based prospective cohort study.We excluded subjects who did not have medical records at delivery, who did not have BP and PU data, who did not have a live birth and who withdrew from the study.Using these criteria, we included 22,452 research subjects in the dataset.The baseline characteristics of all pregnant women in the BirThree Cohort study are provided in a previous report 38 .

Comparison of phenotyped subgroups and diagnoses
We conducted a clinician chart review to compare the subgroups phenotyped by the algorithms with the diagnoses.The comparison was performed to evaluate the consistency of clinical concepts between the developed algorithms and the clinical guidelines.A clinician chart review was performed based on the American guidelines 23 for 252 subjects who met the following criteria: (1) subjects recruited at Tohoku University Hospital between September 2015 and November 2016 and (2) subjects who were able to access their medical records.The clinician chart review was conducted by two obstetricians, a mid-career and a senior obstetrician.If the subgroups assigned by the two doctors differed, the subgroups were determined by discussion.The chart review was performed blindly and annotated in a binary manner for each subgroup of HDP, CH and normotensive.The baseline characteristics of the 252 subjects are shown in Supplementary Table 4.
We calculated positive predictive values (PPVs), negative predictive values (NPVs), accuracies, sensitivities and specificities to compare our algorithms against the diagnoses.To calculate sensitivity and specificity, we identified true positive (TP), true negative (TN), false positive (FP), false negative (FN) by comparison of identified subgroups between phenotyping algorithms and diagnosis of 252 subjects.Based on the four indices, we identified sensitivity and specificity as TP/(TP + FN) and TN/(TN + FP).We identified accuracy using scikit-learn in python3.The diagnosed subgroups of the 252 subjects are listed in Supplementary Table 5.In this study, there were differences in data coverage between the cohort data for phenotyping and the diagnosis dataset that only diagnosis dataset included both intrapartum and postpartum data and ultrasound measurements, in addition to cohort data (right side of Fig. 1).The all items used for the diagnoses are listed in Supplementary Table 6.

Ethics declarations
This study was approved by the ethical committee of Tohoku Medical Megabank Organization, Tohoku University.We confirm that all methods took place in accordance with relevant guidelines and regulations, under informed consent from all participants.

The developed HDP phenotyping algorithms
The rules of the developed HDP phenotyping algorithms 1 and 2 are shown in Fig. 2a and b, respectively.The detailed results of the identification of subjects with a hypertensive disease history and maternal organ dysfunction by analyzing unstructured clinical notes are described in the "Details of the identified subjects with a hypertensive disease history and maternal organ dysfunction" section in the Supplemental Results.The detailed numbers of HDP subgroups phenotyped by the developed phenotyping algorithms are described in "The number of phenotyped subgroups of HDP" section.

The number of phenotyped subgroups of HDP
Of the 22,452 research subjects, 1781 (7.93%) were phenotyped as having HDP by phenotyping algorithm 1.Among the subgroups of HDP, 890 subjects (3.96%) had GH, 304 (1.35%) had SPE, and 587 (2.61%) had PE.According to phenotyping with algorithm 2, 1813 (8.08%) subjects had HDP.Among the subgroups of HDP, 825 subjects (3.67%) had GH, 336 (1.50%) had SPE, and 652 (2.90%) had PE (Table 1).Compared with algorithm 1,  fewer subjects had GH or CH, while more had PE or SPE according to algorithm 2 because the algorithm 2 classifies hypertensive subjects without PU into PE or SPE based on the criteria such as subjects having epigastralgia, HELLP syndrome, eclampsia and FGR, in addition to the criteria of algorithm 1.
The causes of the differences in the phenotyped HDP subgroups between algorithm 1 and 2 are shown in Supplementary Table 8.The sum of the subjects was not equal to 97 because one person may have had multiple symptoms and differences in the timing of the symptoms.

Comparison of the subgroups identified by HDP phenotyping algorithms with diagnoses
Of the 252 subjects included in the diagnosis dataset, 36 (14.2%) were diagnosed as HDP.The subgroups of the 252 subjects are shown in Supplementary Table 5.The performance of phenotyping algorithm 1 and 2 compared to the diagnoses is shown in Tables 2 and 3, respectively.Comparisons for GH EO were not applicable because none of the 252 subjects was diagnosed with GH EO.
For phenotyping of HDP overall, the PPVs, NPVs, accuracies, and specificities of both algorithms 1 and 2 were high (0.96, 0.96, 0.96, and 1.00 for the PPVs, NPVs, accuracy and specificity of algorithm 1 and 0.90, 0.96, 0.95, and 0.99 for the PPV, NPV, accuracy and specificity of algorithm 2, respectively), and the sensitivities were modest (0.72 and 0.75 for algorithm 1 and 2, respectively).Of the 252 subjects in the diagnosis datasets, 15 and 16 subjects were classified into different subgroups by algorithm 1 and algorithm 2, respectively.Of the 15 subjects with different subgroups by algorithm 1, 10 of the differences were caused by missing intrapartum and postpartum data, 1 was caused by a missing creatinine level, 1 was caused by transcription error and 3 were caused by headache that was not accepted as an unexplained new-onset headache that was unresponsive in the chart review (see Supplementary Table 9).
Of the 16 subjects who showed differences by phenotyping algorithm 2, three were new because the subgroups phenotyped by algorithm 1 matched the diagnosed subgroups (see Supplementary Table 10).

Discussion
In this study, we developed two HDP phenotyping algorithms, algorithm 1 and 2. Algorithm 1 identifies HDP patients based on the hypertensive disease history, BP, PU, timing of onset and PE-related clinical conditions.In addition to the factors included in algorithm 1, algorithm 2 conducts phenotyping by considering maternal organ dysfunction and light-for-date, which originate from placental dysfunction in early pregnancy 39 .In these phenotyping results, 97 (0.44%) subjects were phenotyped into different subgroups by the two algorithms, and they actually had clinical conditions related to maternal organ dysfunction, such as epigastralgia and light-for-date  www.nature.com/scientificreports/(Supplementary Table 8).The choice of which algorithm to use depends on whether the phenotyping will be performed with the clinical concepts of the American or Japanese clinical guidelines used in algorithm 1 and 2, respectively.The Japanese clinical guidelines of HDP have been repeatedly updated in the past 35,36,40,41 and will be continually updated according to the progress of research to understand the pathogenesis of HDP.Therefore, algorithm 2 will be updated continuously according to updates to the Japanese guidelines.
In this study, the number of identified subjects with a hypertensive disease history (115; 0.51%) and underlying hepatic or renal disorders (323; 1.44%) by analysis of unstructured clinical notes was slightly smaller than that in previous reports 42,43 .This result suggests the possibility of the underestimation of subjects with these conditions because of differences between the diagnoses and the self-reported interviews, which were the sources of the transcribed data regarding disease history.
Of the 22,452 subjects in the BirThree cohort study, the proportions of the phenotyped HDP patients by algorithms 1 and 2 were almost identical to the previously reported prevalence rate of HDP 38,44,45 .
The prevalence of HDP in the diagnosis dataset was 14.29%, which was higher than that in the full cohort (Supplementary Table 5).The baseline characteristics of the diagnosis dataset were higher risk of HDP in terms such as BMI and age than the full cohort (Supplementary Table 4).These differences may have occurred because the subjects in the diagnosis dataset were recruited at Tohoku University Hospital, which is a leading hospital that actively accepts high-risk cases, including patients with HDP.
In the evaluation of the performances of our developed algorithms, our phenotyping algorithms showed high PPVs, NPVs, accuracies and specificities and modest sensitivities, despite the limitation of data availability of intrapartum and postpartum data (Table 2, 3).In the results comparison between algorithm 1 and diagnosis, 15 of the 252 subjects in the diagnosis dataset were phenotyped into different subgroups of HDP by algorithm 1 (Supplementary Table 9).All of these 15 differences will be resolved by linking EHRs and cohort data 19 because all differences were caused by lacking intrapartum and postpartum data and detailed description about headaches in cohort data.In the comparison for algorithm 2, 16 of 252 subjects were phenotyped into different subgroups of HDP (Supplementary Table 10).In particular, three of the sixteen differences are caused by fit to criteria of algorithm 2 added to algorithm 1, and these are not included in the diagnostic criteria.These results show that the major cause of miss-phenotyping was not inadequacy of phenotyping algorithms but differences of data sources between phenotyping and diagnosis.Based on these results, we concluded that there was consistency in clinical concepts between phenotyping algorithms and clinical guidelines.Furthermore, these results showed that our phenotyping algorithms are ready to be applied in HDP phenotyping tasks.
We used the transcribed medical records of the BirThree Cohort study as data sources for the phenotyping algorithms.The advantage of using the BirThree Cohort study data is the high representativeness of over 70% of the pregnant women in the regional population during the study period.This study is premised on the representativeness of the data sources because we evaluated the performances of our algorithms by comparing the number of subjects having HDP between the study population and the prevalence of HDP in previous reports, and this was large enough to compare the proportion of phenotyped subgroups of HDP with the prevalence rates.

Conclusions
We developed two phenotyping algorithms to identify subjects with HDP and their subtypes from large-scale cohort data collection to perform further biomedical analyses.Our algorithms were developed according to American and Japanese guidelines that differ in the criteria for HDP and its subtypes.Compared to the diagnosis, our phenotyping algorithms showed high PPVs (0.96 and 0.90 for algorithm 1 and 2, respectively), NPVs (0.96 for algorithms 1 and 2), accuracies (0.96 and 0.95 for algorithm 1 and 2, respectively) and specificities (1.00 and 0.99 for algorithm 1 and 2, respectively), and modest sensitivities (0.72 and 0.75 for algorithm 1 and 2, respectively), despite the limitation of data availability for intrapartum and postpartum data in the cohort collection.The application of our phenotyping algorithms to large-scale cohort data will substantially accelerate HDP research by facilitating the high-throughput and precise identification of patients with HDP.

Figure 1 .
Figure 1.Overview of the sources of the cohort data collection and diagnosis dataset.The left side represents data from the cohort that consist of an interview conducted at the first visit, prenatal checkups and records at delivery.The right side represents the diagnosis dataset from the medical record used in validation.The diagnosis dataset consisted of both intrapartum and postpartum data and ultrasound measurements, in addition to cohort collection.

Figure 2 .
Figure 2. The developed rule-based phenotyping algorithms.(a) The rules of the developed HDP phenotyping algorithm 1. Phenotyping by algorithm 1 was conducted based on hypertensive disease history, BP, PU, timing of onset and PE-related conditions.The black boxes represent the conditions of rules, the arrow diagrams represent IF/THEN rules, and the orange boxes represent phenotyped subgroups.(b) The rules of the developed HDP phenotyping algorithm 2. The rules of the developed HDP phenotyping algorithm 2. Phenotyping by algorithm 2 was conducted with maternal organ dysfunction and light-for-date in addition to the factors included in algorithm 1.

Table 1 .
Number of subgroups phenotyped by the phenotyping algorithms.

Table 2 .
The performance of phenotyping algorithm 1.

Table 3 .
The performance of phenotyping algorithm 2.